Abstract

We show that in order to approximate a 2-dimensional section of the 
equidensity ellipsoid of a d-dimensional gaussian random vector, one needs 
at least a number of samples proportional to d. Furthermore, we show 
that with $n \ll d$ samples, the hypothesis that two given coordinates are
fully correlated, when all other coordinates are conditioned to be zero, 
cannot be told apart from the hypothesis that the two are uncorrelated. 



1 Introduction 

The problem of estimating the population covariance matrix given a sample of 
n i.i.d. observations $X_1, \dots, X_n$ in $\mathbb{R}^d$ has been extensively studied. Estimation
of covariance matrices plays a key role in many data analysis techniques (e.g. 
in principal component analysis, discriminant analysis, graphical models). 

It has been shown in [ALPT] that the empirical covariance matrix gives a
good approximation when $n = \Omega(d)$. In the case $n < d$, it is clear that the
empirical covariance matrix cannot give a good approximation for the population
covariance matrix, since it is not of full rank. However, a priori, we could hope
that other approximation schemes may still work. Later in this note, we will
see that this is not the case.
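
The rank obstruction mentioned above can be illustrated with a minimal sketch. The dimensions, the seed, and the helper name `empirical_covariance` are illustrative assumptions, not taken from the text; the samples are assumed to be centered, so no mean is subtracted.

```python
import numpy as np

def empirical_covariance(samples):
    """Empirical covariance (1/n) * sum_i x_i x_i^T of centered samples.

    `samples` has shape (n, d): one observation per row.
    """
    n = samples.shape[0]
    return samples.T @ samples / n

rng = np.random.default_rng(0)
d, n = 20, 5                      # fewer samples than dimensions
X = rng.standard_normal((n, d))   # population covariance is the identity (rank d)
S = empirical_covariance(X)

# S is a sum of n rank-one matrices, so its rank is at most n < d.
rank_S = np.linalg.matrix_rank(S)
print(rank_S, "<=", n, "<", d)
```

Since the empirical matrix has rank at most n while the population covariance has rank d, it cannot be inverted, which is why an estimate of the inverse would have to come from some other scheme.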



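An easier goal than approximating the entire covariance matrix $A$ would be
to approximate a single entry of $A^{-1}$. The latter has a rather natural
interpretation: given a multivariate gaussian random vector $Y = (Y_1, \dots, Y_d)$
and two indices $i, j$, define

$$\alpha_{i,j} = \lim_{\epsilon \to 0} \mathbb{E}\left[ Y_i Y_j \,\middle|\, |Y_k| < \epsilon, \ \forall k \notin \{i,j\} \right]. \qquad (1)$$

One may interpret the quantity $\alpha_{i,j}$ as the effective correlation between $Y_i$ and
$Y_j$, in the sense that it neutralizes the effect of correlation with a third variable
$Y_k$, $k \notin \{i,j\}$. Now, it is easily seen that there is a simple relation between
the numbers $\alpha_{i,j}$ and the matrix $A^{-1}$, namely,

$$\begin{pmatrix} \alpha_{i,i} & \alpha_{i,j} \\ \alpha_{j,i} & \alpha_{j,j} \end{pmatrix} = \begin{pmatrix} (A^{-1})_{i,i} & (A^{-1})_{i,j} \\ (A^{-1})_{j,i} & (A^{-1})_{j,j} \end{pmatrix}^{-1}.$$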

As an example, if the indices represent a set of genes, and the quantity $Y_i$
represents presence or absence of the $i$-th gene, biologists are often interested to
know whether or not a certain correlation between the presence of two different
genes is due to the fact that both genes depend on a third gene. The number
$\alpha_{i,j}$ gives an estimate of whether these two genes are directly correlated, rather
than being both correlated with a third gene.
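
The link between conditioning on the remaining coordinates and the inverse covariance matrix can be checked numerically. The sketch below assumes a centered gaussian vector; `sigma` stands in for the covariance matrix $A$ of the text, and the function names, the matrix size and the chosen indices are hypothetical. It compares the conditional covariance obtained from the Schur complement with the inverse of the corresponding 2-by-2 block of the inverse covariance matrix.

```python
import numpy as np

def conditional_cov_schur(sigma, idx):
    """Covariance of Y[idx] given that all other coordinates are zero (Schur complement)."""
    d = sigma.shape[0]
    rest = [k for k in range(d) if k not in idx]
    s_aa = sigma[np.ix_(idx, idx)]
    s_ab = sigma[np.ix_(idx, rest)]
    s_bb = sigma[np.ix_(rest, rest)]
    return s_aa - s_ab @ np.linalg.solve(s_bb, s_ab.T)

def conditional_cov_precision(sigma, idx):
    """Same quantity, read off the inverse covariance matrix."""
    precision = np.linalg.inv(sigma)
    return np.linalg.inv(precision[np.ix_(idx, idx)])

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
sigma = M @ M.T + 6 * np.eye(6)   # an arbitrary positive definite covariance

alpha_schur = conditional_cov_schur(sigma, [0, 3])
alpha_prec = conditional_cov_precision(sigma, [0, 3])
print(np.allclose(alpha_schur, alpha_prec))
```

The two routes agree because of the block inversion formula, so estimating the effective correlations amounts to estimating a 2-by-2 block of the inverse covariance matrix.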

The goal of this short note is to introduce an information-theoretic lower bound 
for the above question, and show that the number of samples needed in order 
to estimate the numbers $\alpha_{i,j}$ is essentially the same as the minimum number of
samples needed to estimate the entire population covariance matrix using the 
empirical covariance matrix. 

Before we formulate the result, let us introduce some notation. Fix a dimension
$d$, and consider the Euclidean space $\mathbb{R}^d$ with its standard basis $e_1, \dots, e_d$. Denote
by $B$ the unit Euclidean ball and define $E = \mathrm{span}\{e_1, e_2\}$.
For a symmetric matrix $A \in GL(d)$, define $C_E(A)$ to be the covariance matrix
of the uniform distribution on the 2-dimensional ellipse $AB \cap E$ (the entries of
the matrix $C_E(A)$ will be the numbers $\alpha_{1,1}, \alpha_{1,2}, \alpha_{2,2}$ defined in (1)).
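
As a sketch of how the section covariance can be computed for an invertible $A$: the set $AB \cap E$ consists of the points $u \in E$ with $|A^{-1}u| \le 1$, so it is the ellipse $\{u : u^T M u \le 1\}$, where $M$ is the top-left 2-by-2 block of $A^{-T}A^{-1}$. The code assumes the standard fact that the uniform distribution on the unit disc has covariance $\frac{1}{4}\mathrm{Id}$; the function name `section_covariance` and the test matrix are hypothetical. Under this normalization a constant factor $\frac{1}{4}$ appears, which does not affect the rank.

```python
import numpy as np

def section_covariance(A):
    """Covariance of the uniform distribution on A(B) ∩ span{e_1, e_2}, for invertible A."""
    A_inv = np.linalg.inv(A)
    M = (A_inv.T @ A_inv)[:2, :2]   # the section is {u in R^2 : u^T M u <= 1}
    # The ellipse is the image of the unit disc under M^{-1/2},
    # and the unit disc has covariance Id / 4.
    return np.linalg.inv(M) / 4.0

# Diagonal example: the section is the ellipse with semi-axes 2 and 3.
C = section_covariance(np.diag([2.0, 3.0, 5.0]))
print(C)
```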

Let $X_1, \dots, X_n$ be independent samples of the standard gaussian distribution in
$\mathbb{R}^d$. We prove the following theorem:

Theorem 1.1 Suppose n < ^. There does not exist a function F satisfying, 

$$\mathbb{P}\big( F(AX_1, \dots, AX_n) = \mathrm{rank}(C_E(A)) \big) > 0.9 \qquad (2)$$
for all $A \in GL(d)$.

In other words, with a number of samples below this threshold, not only can we
not approximate the constants $\alpha_{i,j}$, but we cannot even determine the rank of
the matrix $C_E(A)$ with reasonable probability.

The idea of the proof is the following: we construct two multivariate gaussian
random vectors $X, Y$ with covariance matrices $A_X, A_Y$ such that
$\mathrm{rank}\, C_E(A_X) \neq \mathrm{rank}\, C_E(A_Y)$, while the total variation distance between two
sequences of $n$ samples from $X$ and $Y$ is rather small. A small total variation
distance implies that for every function $F$, the total variation distance between
the random variables $F(X_1, \dots, X_n)$ and $F(Y_1, \dots, Y_n)$ will be rather small,
which means that $F$ cannot distinguish between the two.



It is interesting to inspect the result of this note in view of some positive 
results concerning the estimation of the covariance matrix which appeared re- 
cently. The results provide methods to approximate the covariance matrix and 
its inverse when some extra assumptions about the distribution of X can be 
made. For example, when the covariance matrix is assumed to be rather sparse, 
some methods can be used in order to estimate the inverse matrix given a 
rather small number of samples. See for example [BLRZ], [V], [LV] and
references therein.

Acknowledgements The author would like to thank Bo'az Klartag and Ro- 

2 Proof of the theorem

To prove the theorem, we assume by contradiction that there exists a function 
$F : (\mathbb{R}^d)^n \to \{0, 1, 2\}$ satisfying (2).

We begin with the construction of two gaussian vectors in $\mathbb{R}^d$.
Let $X_1, \dots, X_n, Y_1, \dots, Y_n$ be independent samples of a standard gaussian
vector, and let $\theta$ be a random variable uniformly distributed on $S^{d-1}$. Define
$\tilde{Y}_i = \mathrm{Proj}_{\theta^\perp} Y_i$. Clearly, $\tilde{Y}_1, \dots, \tilde{Y}_n$ are independent samples of some (random)
distribution. Moreover, since $\langle \tilde{Y}_i, \theta \rangle = 0$, it is clear that $C_E(A)$ is of rank 1
whenever $\theta \in E$.

Our first step is to show, under the assumption of the existence of $F$, that
there also exists a function $G$ satisfying an analogue of (2) which is invariant
under the action of $SO(d)$.

To this end, let $T$ be a random orthogonal matrix distributed uniformly according
to the Haar measure on $SO(d)$. The rotation invariance of the two sequences,
together with (2), implies

$$\mathbb{P}(F(T(X_1), \dots, T(X_n)) = 2) > 0.9, \quad \mathbb{P}(F(T(\tilde{Y}_1), \dots, T(\tilde{Y}_n)) = 1) > 0.9. \qquad (3)$$
Therefore, denoting

$$G(Z_1, \dots, Z_n) = \begin{cases} 2, & \mathbb{E}_T(F(T(Z_1), \dots, T(Z_n))) > 1 \\ 1, & \frac{1}{2} < \mathbb{E}_T(F(T(Z_1), \dots, T(Z_n))) \le 1 \\ 0, & \text{otherwise} \end{cases}$$

it is easily checked that $G$ will satisfy:

$$\mathbb{P}(G(T(X_1), \dots, T(X_n)) = 2) > 0.8, \quad \mathbb{P}(G(T(\tilde{Y}_1), \dots, T(\tilde{Y}_n)) = 1) > 0.8. \qquad (4)$$

The total variation distance between two random variables $X, Y$ with values in
$W$ is defined as

$$d_{TV}(X, Y) = \sup_{A \subset W} \left| \mathbb{P}(X \in A) - \mathbb{P}(Y \in A) \right|.$$

Equation (4) implies that

$$d_{TV}(G(X_1, \dots, X_n), G(\tilde{Y}_1, \dots, \tilde{Y}_n)) > 0.6. \qquad (5)$$

Since $G$ is invariant under rotations, and since one can always choose an
orthogonal transformation $T$ such that

$$T(X_i) \in \mathrm{span}\{e_1, \dots, e_i\}, \quad \forall\, 1 \le i \le n,$$

it is clear that the function $G$ must only depend on the Gram matrix of the
samples. So,

$$d_{TV}(G(X_1, \dots, X_n), G(\tilde{Y}_1, \dots, \tilde{Y}_n)) \le d_{TV}(\mathrm{Gr}(X_1, \dots, X_n), \mathrm{Gr}(\tilde{Y}_1, \dots, \tilde{Y}_n)),$$

where $\mathrm{Gr}(\cdot)$ denotes the Gram matrix.
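
The reduction to the Gram matrix rests on the fact that applying the same orthogonal transformation to every sample leaves all pairwise inner products unchanged. A minimal numerical sketch follows; the sizes, the seed and the helper name `gram` are illustrative assumptions, and the orthogonal matrix is produced by a QR factorization rather than drawn from the Haar measure.

```python
import numpy as np

def gram(samples):
    """Gram matrix of the samples: entry (i, j) is the inner product <x_i, x_j>."""
    return samples @ samples.T

rng = np.random.default_rng(2)
n, d = 4, 10
X = rng.standard_normal((n, d))                    # n samples in R^d, one per row
T, _ = np.linalg.qr(rng.standard_normal((d, d)))   # an orthogonal d x d matrix
X_rot = X @ T.T                                    # apply T to every sample

gram_before = gram(X)
gram_after = gram(X_rot)
print(np.allclose(gram_before, gram_after))
```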
Clearly,

$$\mathrm{Gr}(X_1, \dots, X_n) \sim W_n(\mathrm{Id}, d), \quad \mathrm{Gr}(\tilde{Y}_1, \dots, \tilde{Y}_n) \sim W_n(\mathrm{Id}, d-1),$$

where $W_n(C, p)$ is the Wishart distribution of dimension $n$ with $p$ degrees of
freedom and covariance matrix $C$. Our task is therefore to estimate

$$d_{TV}(W_n(\mathrm{Id}, d-1), W_n(\mathrm{Id}, d))$$

(where the above random matrices are independent).

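It is well known that a random matrix $A \sim W_n(\mathrm{Id}, p)$ has the following density
with respect to the Lebesgue measure on the space of symmetric $n \times n$ matrices:

$$f_{n,p}(A) = \frac{\det(A)^{\frac{1}{2}(p-n-1)} \exp\left(-\frac{1}{2}\mathrm{Trace}(A)\right)}{2^{pn/2}\, \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma\left(\frac{1}{2}(p+1-i)\right)}$$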

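Denote the measure expressing the law of $A$ by $\mu_{n,p}$. We would like to estimate
the total variation metric between $\mu_{n,d}$ and $\mu_{n,d-1}$. For this, we write,

$$d_{TV}(W_n(\mathrm{Id}, d-1), W_n(\mathrm{Id}, d)) = \frac{1}{2} \int \left| f_{n,d-1}(A) - f_{n,d}(A) \right| d\lambda(A) =$$

(where $\lambda$ is the Lebesgue measure on the space of symmetric $n \times n$ matrices)

$$\frac{1}{2} \int \left| \frac{\det(A)^{1/2}}{\int \det(A)^{1/2} \, d\mu_{n,d-1}(A)} - 1 \right| d\mu_{n,d-1}(A) \le \frac{1}{2} \sqrt{ \int \left( \frac{\det(A)^{1/2}}{\int \det(A)^{1/2} \, d\mu_{n,d-1}(A)} - 1 \right)^2 d\mu_{n,d-1}(A) }.$$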
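
Let $X$ be a random variable such that $\mathbb{E}[|X|^4]$ exists. It follows from Liapunov's
inequality that

$$\frac{\mathbb{E}[|X|^4]}{\mathbb{E}[|X|^2]} \ge \left( \frac{\mathbb{E}[|X|^2]}{\mathbb{E}[|X|]} \right)^2.$$

So,

$$\frac{\mathbb{E}[|X|^4]}{\mathbb{E}[|X|^2]^2} \ge \frac{\mathbb{E}[|X|^2]}{\mathbb{E}[|X|]^2}.$$

And so, subtracting 1 from both sides,

$$\frac{\mathrm{Var}[|X|^2]}{\mathbb{E}[|X|^2]^2} \ge \frac{\mathrm{Var}[|X|]}{\mathbb{E}[|X|]^2}.$$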
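
As a quick sanity check of the moment comparison, one can evaluate both sides exactly for a concrete distribution. The choice of an Exponential(1) random variable, whose k-th moment equals k!, is an illustrative assumption and plays no role in the proof.

```python
from fractions import Fraction
from math import factorial

def moment(k):
    """k-th moment of an Exponential(1) random variable, which equals k!."""
    return Fraction(factorial(k))

m1, m2, m4 = moment(1), moment(2), moment(4)

# Liapunov step: E|X|^4 / E|X|^2 >= (E|X|^2 / E|X|)^2
lyapunov_lhs = m4 / m2
lyapunov_rhs = (m2 / m1) ** 2

# Consequence: Var[|X|^2] / E[|X|^2]^2 >= Var[|X|] / E[|X|]^2
rel_var_square = (m4 - m2**2) / m2**2
rel_var_plain = (m2 - m1**2) / m1**2

print(lyapunov_lhs >= lyapunov_rhs, rel_var_square >= rel_var_plain)
```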
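It follows that, taking $|X| = \det(A)^{1/2}$ with $A \sim W_n(\mathrm{Id}, d-1)$,

$$d_{TV}(W_n(\mathrm{Id}, d-1), W_n(\mathrm{Id}, d)) \le \frac{1}{2} \sqrt{ \frac{\mathrm{Var}[\det(W_n(\mathrm{Id}, d-1))]}{\mathbb{E}[\det(W_n(\mathrm{Id}, d-1))]^2} }.$$

As shown in [DMO], Theorem 4.4, one has

$$\mathrm{Var}[\det(W_n(\mathrm{Id}, d-1))] = \frac{(d-1)!}{(d-1-n)!} \left( \frac{((d-1)+2)!}{((d-1)+2-n)!} - \frac{(d-1)!}{((d-1)-n)!} \right)$$

and

$$\mathbb{E}[\det(W_n(\mathrm{Id}, d-1))] = \frac{(d-1)!}{(d-1-n)!}.$$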

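So,

$$d_{TV}(W_n(\mathrm{Id}, d-1), W_n(\mathrm{Id}, d)) \le \frac{1}{2} \sqrt{ \frac{(d+1)!\,(d-1-n)!}{(d+1-n)!\,(d-1)!} - 1 } = \frac{1}{2} \sqrt{ \frac{d(d+1)}{(d-n)(d-n+1)} - 1 }.$$

The above expression is smaller than 0.6 under the assumption on $n$ in
Theorem 1.1. This contradicts (5), and the proof is finished. □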
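
The final bound is easy to evaluate numerically. The sketch below computes it once from the factorial formulas for the mean and variance of the determinant and once from the simplified closed form, then prints it for a few sample sizes. The dimension d = 120 and the ratios n/d are illustrative choices only and are not the threshold of Theorem 1.1; the function names are hypothetical.

```python
from math import factorial, sqrt, isclose

def tv_bound_factorial(n, d):
    """(1/2) * sqrt(Var[det W] / E[det W]^2) for W ~ W_n(Id, d-1), via the factorial formulas."""
    p = d - 1
    mean = factorial(p) // factorial(p - n)
    var = mean * (factorial(p + 2) // factorial(p + 2 - n) - mean)
    return 0.5 * sqrt(var / mean**2)

def tv_bound_closed(n, d):
    """The simplified form (1/2) * sqrt(d(d+1) / ((d-n)(d-n+1)) - 1)."""
    return 0.5 * sqrt(d * (d + 1) / ((d - n) * (d - n + 1)) - 1)

d = 120
for n in (d // 6, d // 4, d // 3, d // 2):
    # Print the sample size, the bound, and whether it falls below 0.6.
    bound = tv_bound_closed(n, d)
    print(n, round(bound, 3), bound < 0.6)
```

The bound depends essentially on the ratio n/d, which is consistent with the claim that a number of samples proportional to d is needed.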



Remark 2.1 It is easy to see that when $n \ll d$, the function $F$ cannot do
much better than being correct with probability $\frac{1}{2}$; hence, it cannot do better
than guessing the rank of $C_E(A)$.

Remark 2.2 Following the same lines of proof, one can also show that the
correlation between two coordinates cannot be approximated even when conditioning
on all but $k$ coordinates to be zero, whenever $k$ is small enough.